The Role of Algorithm Bias vs Information Source in Learning Algorithms for Morphosyntactic Disambiguation
Abstract
Morphosyntactic Disambiguation (Part of Speech tagging) is a useful benchmark problem for system comparison because it is typical of a large class of Natural Language Processing (NLP) problems that can be defined as disambiguation in local context. This paper adds to the literature on the systematic and objective evaluation of different methods for automatically learning this type of disambiguation problem. We systematically compare two inductive learning approaches to tagging: MXPOST (based on maximum entropy modeling) and MBT (based on memory-based learning). We investigate the effect of different sources of information on accuracy when comparing the two approaches under the same conditions. Results indicate that earlier observed differences in accuracy can be attributed largely to differences in the information sources used, rather than to algorithm bias.
1 Comparing Taggers
Morphosyntactic Disambiguation (Part of Speech tagging) is concerned with assigning morphosyntactic categories (tags) to words in a sentence, typically by employing a complex interaction of contextual and lexical clues to trigger the correct disambiguation. As a contextual clue, we might for instance assume that it is unlikely that a verb will follow an article. As a lexical (morphological) clue, we might assign a word like "better" the tag "comparative" if we notice that its suffix is "-er".
POS tagging is a useful first step in text analysis, but also a prototypical benchmark task for the type of disambiguation problem that is paramount in natural language processing: assigning one of a set of possible labels to a linguistic object given different information sources derived from the linguistic context. Techniques working well in the area of POS tagging may also work well in a large range of other NLP problems such as word sense disambiguation and discourse segmentation, when reliable annotated corpora providing good predictive information sources for these problems become
Similar resources
Improvement of Routing Operation Based on Learning with Using Smart Local and Global Agents and with the Help of the Ant Colony Algorithm
Routing in computer networks has played a special role in recent years because of its effect on network performance. Quality of service and security are among the most important challenges in routing, owing to the lack of reliable methods. Routers use routing algorithms to find the best route to a particular destination. When talking about the best path, we consider p...
Research of Blind Signals Separation with Genetic Algorithm and Particle Swarm Optimization Based on Mutual Information
Blind source separation separates mixed signals without any information on the mixing system. In this paper, we use two evolutionary algorithms, namely a genetic algorithm and particle swarm optimization, for blind source separation. For these techniques, a novel fitness function based on mutual information and higher-order statistics is proposed. In order to evalu...
Optimization of e-Learning Model Using Fuzzy Genetic Algorithm
The e-learning model is examined along three major dimensions, each of which has a range of indicators relevant to optimization and modeling. In many optimization problems, the objective function or constraints may change over time, and as a result the optimum of these problems can also change. If any of these uncertain events is considered in the optimization process, t...